10 research outputs found

    A Hierarchical Model of Web Summaries

    We investigate the relevance of hierarchical topic models to represent the content of Web gists. We focus our attention on DMOZ, a popular Web directory, and propose two algorithms to infer such a model from its manually-curated hierarchy of categories. Our first approach, based on information-theoretic grounds, uses an algorithm similar to recursive feature selection. Our second approach is fully Bayesian and derived from the more general model, hierarchical LDA. We evaluate the performance of both models against a flat 1-gram baseline and show improvements in terms of perplexity over held-out data.
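The perplexity comparison mentioned in the abstract above can be illustrated with a minimal sketch of a flat unigram baseline scored on held-out text. This is not the authors' code: the add-alpha smoothing, the toy token lists, and the function names `train_unigram` and `perplexity` are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab, alpha=1.0):
    """Estimate add-alpha smoothed unigram probabilities over a fixed vocabulary.

    The smoothing scheme is an assumption; the abstract does not specify one.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def perplexity(model, tokens):
    """Perplexity = exp of the negative average log-probability per token."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


# Toy data (hypothetical), standing in for training and held-out Web summaries.
train = "web directory of web pages".split()
held_out = "web pages directory".split()
vocab = set(train) | set(held_out)

model = train_unigram(train, vocab)
ppl = perplexity(model, held_out)
print(f"held-out perplexity: {ppl:.3f}")
```

Lower perplexity on held-out data means the model assigns higher probability to unseen text, which is the sense in which a hierarchical model would be said to improve on this baseline.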

    Enabling Interoperability For Autonomous Digital Libraries : An API To CiteSeer Services

    No full text
    We introduce CiteSeer-API, a public API to CiteSeer-like services. CiteSeer-API is SOAP/WSDL based and allows easy programmatic access to all the specific functionalities offered by CiteSeer services, including full-text search of documents and citations and citation-based document discovery. CiteSeer-API is currently showcased on SMEALSearch [10], a digital library search engine for business academic publications.

    A Service-Oriented Architecture for Digital Libraries

    No full text
    CiteSeer is currently a very large source of meta-data information on the World Wide Web (WWW). This meta-data is the key material for the Semantic Web. Still, CiteSeer is not yet a Semantic-enabled service, and therefore its meta-data, although potentially usable by Semantic Web agents, is not yet reachable through Semantic Web mechanisms. The complexity of CiteSeer, that is, the range of tasks it supports, makes the transition to a Semantic-enabled service a non-trivial task. While human users tend to perceive CiteSeer as a single well-integrated service, we believe it is best seen, from a machine perspective, as a collection of services, each performing a specific task. In this paper we present our approach to enabling CiteSeer on the Semantic Web so that its meta-data can be used through Semantic Web mechanisms. We first introduce an intuitive Application Programming Interface (API) to the CiteSeer software, then show that an efficient integration of CiteSeer in the Semantic Web is best achieved by independently integrating the services that comprise it. We believe the effort presented here towards the Semantic integration of a complex Information Retrieval system could be used as an integration model for arbitrary systems.

    Citeseer-api: towards seamless resource location and interlinking for digital libraries

    No full text
    We introduce CiteSeer-API, a public API to CiteSeer-like services. CiteSeer-API is SOAP/WSDL based and allows easy programmatic access to all the specific functionalities offered by CiteSeer services, including full-text search of documents and citations and citation-based document discovery. To enable interoperability and interlinking with arbitrary software agents and digital library systems, CiteSeer-API uses digital content signatures to create system-independent handles for the Document, Citation and Group resources of CiteSeer servers. We discuss specific functionalities of CiteSeer-API that take advantage of these handles to enable seamless location of CiteSeer resources. Finally, we argue that the digital signature scheme used by CiteSeer-API is well suited to the creation of machine-usable semantic descriptions of digital library services, which is key to the seamless discovery and integration of services such as CiteSeer-API. CiteSeer-API is currently showcased on CiteSeer.IST, the CiteSeer server of the School o
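The idea of deriving a system-independent handle from a content signature can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which hash function, normalization, or handle format CiteSeer-API uses, so SHA-256, the whitespace and case normalization, the `kind:digest` layout, and the function name `content_handle` are all hypothetical.

```python
import hashlib


def content_handle(kind, content):
    """Build a handle from a resource kind and a digest of its content.

    Assumptions (not from the source): SHA-256 as the signature function,
    lowercase plus collapsed whitespace as normalization, "kind:hexdigest"
    as the handle format.
    """
    normalized = " ".join(content.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"{kind}:{digest}"


# Two servers holding the same text with different layout derive the same handle.
handle_a = content_handle("document", "A Service-Oriented Architecture\nfor Digital Libraries")
handle_b = content_handle("document", "a service-oriented   architecture for digital libraries")
handle_c = content_handle("document", "eBizSearch: A Niche Search Engine for e-Business")
print(handle_a == handle_b, handle_a == handle_c)
```

Because the handle depends only on the content, any system holding the same resource can compute it without consulting a central registry, which is the property that makes such handles usable for interlinking.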

    eBizSearch: A Niche Search Engine for e-Business

    No full text
    Niche search engines offer an efficient alternative to traditional search engines when the results returned by general-purpose search engines are not sufficiently relevant. By taking advantage of their domain of concentration, they achieve higher relevance and offer enhanced features. We discuss a new niche search engine, eBizSearch, based on the technology of CiteSeer and dedicated to e-business and e-business documents. We present the integration of CiteSeer into the eBizSearch framework and the process needed to tune the whole system towards the specific area of e-business. We also discuss how we use machine learning algorithms to generate the metadata that makes eBizSearch Open Archives compliant. eBizSearch is a publicly available service and can be reached at [3].